Sparse Bayesian hierarchical modeling of high-dimensional clustering problems

نویسنده

  • Heng Lian
چکیده

Clustering is one of the most widely used procedures in the analysis of microarray data, for example with the goal of discovering cancer subtypes based on observed heterogeneity of genetic marks between different tissues. It is wellknown that in such high-dimensional settings, the existence of many noise variables can overwhelm the few signals embedded in the high-dimensional space. We propose a novel Bayesian approach based on Dirichlet process with a sparsity prior that simultaneous performs variable selection and clustering, and also discover variables that only distinguish a subset of the cluster components. Unlike previous Bayesian formulations, we use Dirichlet process (DP) for both clustering of samples as well as for regularizing the high-dimensional mean/variance structure. To solve the computational challenge brought by this double usage of DP, we propose to make use of a sequential sampling scheme embedded within Markov chain Monte Carlo (MCMC) updates to im-

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

متن کامل

Classification in Very High Dimensional Problems with Handfuls of Examples

Modern classification techniques perform well when the number of training examples exceed the number of features. If, however, the number of features greatly exceed the number of training examples, then these same techniques can fail. To address this problem, we present a hierarchical Bayesian framework that shares information between features by modeling similarities between their parameters. ...

متن کامل

Non-parametric Bayesian Hierarchical Factor Modeling and Regression

We address the problem of sparse Bayesian factor regression from high-dimensional gene-expression data where the number and inter-relationship of factors is not known apriori. We take a non-parametric Bayesian approach based on a variant of the Indian Buffet Process [1]. This leads to an interpretable model for gene-pathway relationships, a simple inference procedure, and allows us to consider ...

متن کامل

Semi-supervised Hierarchical Clustering Analysis for High Dimensional Data

In many data mining tasks, there is a large supply of unlabeled data but limited labeled data since it is expensive generated. Therefore, a number of semi-supervised clustering algorithms have been proposed, but few of them are specially designed for high dimensional data. High dimensionality is a difficult challenge for clustering analysis due to the inherent sparse distribution, and most of p...

متن کامل

یک روش مبتنی بر خوشه‌بندی سلسله‌مراتبی تقسیم‌کننده جهت شاخص‌گذاری اطلاعات تصویری

It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Many efforts have been done in order to develop multi-dimensional indexing structures so far. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • J. Multivariate Analysis

دوره 101  شماره 

صفحات  -

تاریخ انتشار 2010